4 research outputs found
Deriving a Representative Vector for Ontology Classes with Instance Word Vector Embeddings
Selecting a representative vector for a set of vectors is a very common
requirement in many algorithmic tasks. Traditionally, the mean or median vector
is selected. Ontology classes are sets of homogeneous instance objects that can
be converted to a vector space by word vector embeddings. This study proposes a
methodology to derive a representative vector for ontology classes whose
instances were converted to the vector space. We start by deriving five
candidate vectors which are then used to train a machine learning model that
calculates a representative vector for the class. We show that our
methodology outperforms the traditional mean and median vector
representations.
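As a point of reference for the traditional baselines mentioned above, the following is a minimal sketch of the mean and median vectors of a set of instance vectors. It assumes that "median vector" means the component-wise median, which the abstract does not specify. The function names and the toy data are hypothetical, and the sketch does not reproduce the learned model proposed in the paper.

```python
from statistics import mean, median


def mean_vector(vectors):
    """Component-wise arithmetic mean of equal-length vectors."""
    return [mean(component) for component in zip(*vectors)]


def median_vector(vectors):
    """Component-wise median (an assumed reading of 'median vector')."""
    return [median(component) for component in zip(*vectors)]


# Hypothetical 2-dimensional embeddings for four instances of one class.
instance_vectors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [4.0, 2.0],
]

class_mean = mean_vector(instance_vectors)
class_median = median_vector(instance_vectors)
```

In this toy set the last instance pulls the mean away from the other three, while the component-wise median is less affected. That sensitivity to outliers is one reason a single fixed statistic may not represent a class well.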
Semi-Supervised Instance Population of an Ontology using Word Vector Embeddings
In many modern-day systems such as information extraction and knowledge
management agents, ontologies play a vital role in maintaining the concept
hierarchies of the selected domain. However, ontology population has become a
problematic process because it depends heavily on manual human
intervention. Word embeddings have become popular in the field of natural
language processing because of their ability to cope with semantic
sensitivity. Hence, in this study, we propose a novel approach to
semi-supervised ontology population based on word embeddings. We
built several models including traditional benchmark models and new types of
models based on word embeddings. Finally, we combine them into an ensemble
to obtain a synergistic model with better accuracy. We demonstrate that
our ensemble model can outperform the individual models.
Synergistic Union of Word2Vec and Lexicon for Domain Specific Semantic Similarity
Semantic similarity measures are an important part of natural language
processing tasks. However, semantic similarity measures built for general use
do not perform well within specific domains. Therefore, in this study, we
introduce a domain-specific semantic similarity measure created by the
synergistic union of word2vec, a word embedding method used for semantic
similarity calculation, and lexicon-based (lexical) semantic similarity
methods. We show that the proposed methodology outperforms word embedding
methods trained on a generic corpus, as well as methods trained on a
domain-specific corpus that do not use lexical semantic similarity methods to
augment the results. Further, we show that text lemmatization can improve the
performance of word embedding methods.
Comment: 6 pages, 3 figures
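To make the idea of combining an embedding-based score with a lexicon-based score concrete, here is a minimal sketch. Cosine similarity is the usual way to compare word vectors. The weighted average used to blend the two scores, the `weight` parameter, and both function names are assumptions for illustration only, since the abstract does not state how the two sources are combined.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Treat a zero vector as having no similarity.
    return dot / (norm_u * norm_v)


def blended_similarity(embedding_score, lexical_score, weight=0.5):
    """Hypothetical blend: weighted average of the two similarity scores."""
    return weight * embedding_score + (1 - weight) * lexical_score


# Hypothetical word vectors and a hypothetical lexicon-based score.
vec_a = [1.0, 0.0]
vec_b = [1.0, 0.0]
vec_c = [0.0, 1.0]

same_direction = cosine_similarity(vec_a, vec_b)
orthogonal = cosine_similarity(vec_a, vec_c)
combined = blended_similarity(0.8, 0.4, weight=0.5)
```

In practice the embedding score would come from a trained word2vec model and the lexical score from a lexicon-based measure. Both are replaced by literal numbers here to keep the sketch self-contained.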
Legal Document Retrieval using Document Vector Embeddings and Deep Learning
Domain-specific information retrieval has been a prominent and ongoing
research topic in the field of natural language processing. Many researchers
have incorporated different techniques to overcome technical and domain
specificity and provide a mature model for various domains of interest. The
main bottleneck in these studies is the heavy dependence on domain experts,
which makes the entire process time-consuming and cumbersome. In this study,
we developed three novel models, which are compared against a gold standard
generated via the online repositories provided specifically for the legal
domain. The three models incorporate vector space representations of the
legal domain, where document vectors were generated by two different
mechanisms and by an ensemble of the two. This study covers the research
carried out in representing legal case documents in different vector spaces
while incorporating semantic word measures and natural language processing
techniques. The ensemble model built in this study shows a significantly
higher accuracy level, which demonstrates the need to incorporate
domain-specific semantic similarity measures into the information retrieval
process. This study also shows the impact of varying the distribution of the
word similarity measures against varying document vector dimensions, which
can lead to improvements in legal information retrieval.